PREDICTING THE CLASS OF WINE WITH NEURAL NETWORKS
An example of a multivariate data type classification problem using Neuroph
by Ana Nikolic, Faculty of Organisation Sciences, University of Belgrade
An experiment for the Intelligent Systems course
Introduction
The goal of this experiment is to show how neural networks and Neuroph Studio can be used to solve classification problems. We will show that by creating different architectures for the same problem, we can find the best solution.
A classification process involves assigning objects to predefined groups or classes based on a number of observed attributes of those objects. Although there are more traditional tools for classification, such as certain statistical procedures, neural networks have proven to be an effective solution for this type of problem. There are a number of advantages to using neural networks: they are data driven, they are self-adaptive, and they can approximate any function, linear as well as non-linear (which is quite important in this case, because groups often cannot be separated by linear functions). Neural networks classify objects rather simply: they take data as input, derive rules based on those data, and make decisions.
For a better understanding of our experiment, we suggest that you first look at the links below:
Neuroph Studio - Getting Started
Multi Layer Perceptron
Introducing the problem
Our assignment is to train a neural network to predict which type a wine belongs to when its other attributes are given as input. The first thing we need is a data set. We'll use a data set found at http://archive.ics.uci.edu/ml/ . The name of the data set is Wine Data Set (1991-07-01). These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines. The data set contains 178 instances and 13 attributes.
The input attributes are:
1) Alcohol
2) Malic acid
3) Ash
4) Alcalinity of ash
5) Magnesium
6) Total phenols
7) Flavanoids
8) Nonflavanoid phenols
9) Proanthocyanins
10) Color intensity
11) Hue
12) OD280/OD315 of diluted wines
13) Proline
Each instance has one of 3 possible classes (three types of wine).
The output attributes are:
1) 1 (first type of wine)
2) 2 (second type of wine)
3) 3 (third type of wine)
The data set can be downloaded here, but if we want to use it in Neuroph, we need to prepare the data first. The type of neural network that will be used in this experiment is a multi layer perceptron with backpropagation.
Download the data set
Procedure of training a neural network
In order to train a neural network, there are six steps to be made:
- Prepare the data
- Create a Neuroph project
- Create a training set
- Create a neural network
- Train the network
- Test the network to make sure that it is trained properly
Step 1. Preparing the data
The first thing we need to do before creating our network is to prepare the data in a form that Neuroph can understand. Because our data set contains attributes with different ranges, we will first normalize the numeric attributes.
We're using the standard min-max normalization formula:

B = ((A - min A) / (max A - min A)) * (C - D) + D

where:
B is the standardized value
A is the given value
D and C determine the range in which we want our value to be. In this case, D = 0 and C = 1, so every normalized value falls in the 0-1 range.
The class attribute is not normalized with this method, because it takes the value 1, 2 or 3, and a more appropriate method is to turn the three classes into three outputs. If an instance belongs to the first class, the first type of wine, the value of the first output will be 1 and the values of the second and third outputs will be 0. If an instance belongs to the second class, the second type of wine, the value of the second output for that instance will be 1, and the values of the first and third outputs will be 0. Finally, if an instance belongs to the third class, the third type of wine, the value of the third output will be 1, and the values of the first and second outputs will be 0. That way, all three outputs take values of 0 or 1, which fits our model, where all the data are in the 0-1 range.
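To make this preprocessing concrete, here is a minimal Java sketch of the two transformations described above. The class, the method names, and the sample min/max values are our own illustration, not part of Neuroph:

```java
import java.util.Arrays;

/** Minimal sketch of the preprocessing described above. */
public class WinePreprocessing {

    // Min-max normalization with D = 0 and C = 1: maps A from
    // [min, max] into the 0-1 range.
    static double normalize(double a, double min, double max) {
        return (a - min) / (max - min);
    }

    // Turns the class label 1, 2 or 3 into three 0/1 outputs.
    static double[] classToOutputs(int wineClass) {
        double[] outputs = new double[3];
        outputs[wineClass - 1] = 1.0;
        return outputs;
    }

    public static void main(String[] args) {
        // Example: an alcohol value of 13.2, assuming the observed
        // minimum and maximum are 11.03 and 14.83.
        System.out.println(normalize(13.2, 11.03, 14.83)); // ~0.571

        // Class 2 becomes [0.0, 1.0, 0.0].
        System.out.println(Arrays.toString(classToOutputs(2)));
    }
}
```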
Step 2. Creating a new Neuroph project
Now we need to create a new Neuroph project, by clicking on the 'File' menu and then 'New Project'.
We named the project “NeurophWineProject”.
The final step is to click the 'Finish' button, and the new project is created. The project will be shown in the top left corner of Neuroph Studio.
Step 3. Creating a training set
To create a new training set, we right-click our project, choose the option 'New', and then 'Training Set'. We name it, set the parameters, and choose the 'Supervised' training option, because we want to minimize the prediction error through an iterative procedure. Supervised training is accomplished by giving the neural network a set of sample data along with the anticipated output for each of these samples; that sample data will be our data set. Supervised training is the most common way of training a neural network. As supervised training proceeds, the neural network is taken through a number of iterations, until its output matches the anticipated output with a reasonably small rate of error. The error rate we consider appropriate for a well-trained network is set just before the training starts; usually, that number will be around 0.01.
In our case the number of inputs will be 13, because we have 13 input attributes, and the number of outputs will be 3, because we have 3 output attributes.
Now we click 'Next' and edit the training set table. We click 'Load from file' and select the file on our computer that contains the data set. We will use 70% of the data from our original data set for training. We also need to select a value separator; in this case, it is the tab character. After finishing these steps, our table will be loaded and displayed.
This is how our table looks after loading.
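For readers who prefer code to the Studio wizard, loading such a prepared, tab-separated file and splitting it 70/30 might look roughly like the sketch below. The file name is an assumption; the DataSet calls are from the Neuroph core API as we understand it:

```java
import java.util.List;
import org.neuroph.core.data.DataSet;
import org.neuroph.core.data.DataSetRow;

public class LoadAndSplitWineData {
    public static void main(String[] args) {
        // 13 input columns, 3 output columns, tab as the value separator
        DataSet all = DataSet.createFromFile(
                "wine_normalized.txt", 13, 3, "\t");
        all.shuffle(); // randomize row order before splitting

        // keep 70% of the rows for training, the rest for testing
        List<DataSetRow> rows = all.getRows();
        int trainCount = (int) (rows.size() * 0.7);
        DataSet training = new DataSet(13, 3);
        DataSet test = new DataSet(13, 3);
        for (int i = 0; i < rows.size(); i++) {
            (i < trainCount ? training : test).addRow(rows.get(i));
        }
        System.out.println(training.getRows().size() + " training rows, "
                + test.getRows().size() + " test rows");
    }
}
```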
After finishing this, we can start creating neural networks. We will create several neural networks with different sets of parameters, and determine which is the best solution for our problem by testing them. That means that we will have several training attempts.
Data set used for training
Training attempt 1
Step 4.1 Creating a neural network
We will create it by right-clicking our project in the 'Projects' window, and then clicking 'New' and 'Neural Network'. We will set the name and the type of the network: we will name it 'NewNetwork1' and choose Multi Layer Perceptron. The multi layer perceptron is the most widely studied and used neural network classifier. It is capable of modeling complex functions, it is robust (good at ignoring irrelevant inputs and noise), and it can adapt its weights and/or topology in response to changes in the environment. Another reason we use this type of perceptron is simply that it is very easy to use: it implements a black-box point of view, and can be used with little knowledge about the function to be modeled.
Now we can click 'Next' and then set the parameters that we need for the multi layer perceptron. The numbers of input and output neurons are the same as the numbers of inputs and outputs in the training set. Now we have to select the number of hidden layers, and the number of neurons in each layer. We will choose only one hidden layer, and for the first training attempt we will choose three hidden neurons.
We have checked 'Use Bias Neurons', and chosen the sigmoid transfer function (because the range of our data is 0-1; had it been -1 to 1, we would have checked 'Tanh'). As the learning rule we have chosen 'Backpropagation with Momentum'. This learning rule will be used in all the networks we create, because backpropagation is the most commonly used technique and the one best suited to this type of problem. In this method, the objects in the training set are given to the network one by one in random order, and the weights are updated each time to make the current prediction error as small as possible. This process continues until the weights converge. We have also chosen to add an extra term, momentum, to the standard backpropagation formula in order to improve the efficiency of the algorithm.
When we click 'Finish', the first neural network that we will test is created.
If you want to see the neural network as a graph, just select 'Graph View'. The rightmost nodes in the first and second layers are bias neurons.
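As a rough API-level equivalent of these wizard choices, creating such a network in code might look like the following sketch. The layer sizes match the choices above; the saved file name is an assumption, and Neuroph's MultiLayerPerceptron uses bias neurons by default:

```java
import org.neuroph.nnet.MultiLayerPerceptron;
import org.neuroph.nnet.learning.MomentumBackpropagation;
import org.neuroph.util.TransferFunctionType;

public class CreateWineNetwork {
    public static void main(String[] args) {
        // 13 inputs, one hidden layer with 3 neurons, 3 outputs,
        // sigmoid transfer function because our data are in the 0-1 range
        MultiLayerPerceptron network = new MultiLayerPerceptron(
                TransferFunctionType.SIGMOID, 13, 3, 3);

        // backpropagation with momentum as the learning rule
        network.setLearningRule(new MomentumBackpropagation());

        network.save("NewNetwork1.nnet"); // assumed file name
    }
}
```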
Step 5.1 Training the neural network
For training the neural network we'll use the 70% of the original data set that we prepared as the training set; the remaining 30% is kept for testing. We select the training set and click 'Train'. In the new window that opens, we set the learning parameters: the maximum error will be 0.01, the learning rate 0.2, and the momentum 0.7. Then we click 'Train'.
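The same training session, run through the API instead of the Studio GUI, might look roughly like this. The file names are assumptions, and the cast relies on the network's learning rule being MomentumBackpropagation, as chosen above:

```java
import org.neuroph.core.data.DataSet;
import org.neuroph.nnet.MultiLayerPerceptron;
import org.neuroph.nnet.learning.MomentumBackpropagation;
import org.neuroph.util.TransferFunctionType;

public class TrainWineNetwork {
    public static void main(String[] args) {
        DataSet trainingSet = DataSet.createFromFile(
                "wine_training_normalized.txt", 13, 3, "\t"); // assumed name

        MultiLayerPerceptron network = new MultiLayerPerceptron(
                TransferFunctionType.SIGMOID, 13, 3, 3);

        // the learning parameters used in this attempt
        MomentumBackpropagation rule =
                (MomentumBackpropagation) network.getLearningRule();
        rule.setMaxError(0.01);
        rule.setLearningRate(0.2);
        rule.setMomentum(0.7);

        network.learn(trainingSet); // runs until maxError is reached
        network.save("NewNetwork1.nnet");
    }
}
```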
The graph went down after 45 iterations.
Step 6.1 Testing the neural network
The test showed that the total mean square error is 0.04236700941165338. The goal is to set up the experiments in such a way that, when the observations are analyzed, the mean square error is close to zero relative to the magnitude of at least one of the estimated effects.
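As a sketch of what such a test computes, the total mean square error can be obtained by feeding every test row through the network and averaging the squared output errors. The file names are assumptions, and Neuroph's exact averaging convention may differ slightly from this:

```java
import org.neuroph.core.NeuralNetwork;
import org.neuroph.core.data.DataSet;
import org.neuroph.core.data.DataSetRow;

public class TestWineNetwork {
    public static void main(String[] args) {
        NeuralNetwork network = NeuralNetwork.createFromFile("NewNetwork1.nnet");
        DataSet testSet = DataSet.createFromFile(
                "wine_test_normalized.txt", 13, 3, "\t"); // assumed name

        double sumSquaredError = 0;
        int count = 0;
        for (DataSetRow row : testSet.getRows()) {
            network.setInput(row.getInput()); // feed the 13 inputs
            network.calculate();
            double[] actual = network.getOutput();
            double[] desired = row.getDesiredOutput();
            for (int i = 0; i < desired.length; i++) {
                double error = desired[i] - actual[i];
                sumSquaredError += error * error;
                count++;
            }
        }
        System.out.println("Total mean square error: " + sumSquaredError / count);
    }
}
```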
Now we will test this network with several input values to see how the network will behave. We will select 5 random input values from our data set.
Those are:
Observation | Alcohol | Malic acid | Ash | Alcalinity of ash | Magnesium | Total phenols | Flavanoids | Nonflavanoid phenols | Proanthocyanins | Color intensity | Hue | OD280/OD315 | Proline | Desired output (type 1, 2, 3) | Real output (type 1, 2, 3)
1 | 0.5125 | 0.21 | 0.11 | 0.736842 | 0.586957 | 0.543333 | 0.536 | 0.47 | 0.48 | 0.215 | 0.565 | 0.733333 | 0.393723 | 1, 0, 0 | 0.034351, 0.936341, 0.004308
11 | 0.62 | 0.162 | 0.705 | 0.5 | 0.326087 | 0.566667 | 0.596 | 0.26 | 0.465 | 0.341667 | 0.52 | 0.823333 | 0.457917 | 1, 0, 0 | 0.363275, 0.624203, 0.023632
41 | 0.27 | 0.166 | 0.66 | 0.394737 | 0.119565 | 0.2 | 0.3 | 0.52 | 0.41 | 0.116667 | 0.54 | 0.423333 | 0.14408 | 0, 1, 0 | 0.008792, 0.977752, 0.014986
70 | 0.25 | 0.486 | 0.5 | 0.421053 | 0.184783 | 0.333333 | 0.328 | 0.37 | 0.4675 | 0.023333 | 0.465 | 0.683333 | 0.203994 | 0, 1, 0 | 0.008830, 0.977756, 0.014964
82 | 0.47 | 0.398 | 0.7 | 0.473684 | 0.369565 | 0.1 | 0.244 | 0.24 | 0.2075 | 0.366667 | 0.37 | 0.14 | 0.179743 | 0, 0, 1 | 0.008308, 0.011002, 0.986591
We can see that the neural network guessed only three of the five instances, so even though the total mean square error is acceptable, we will continue looking for a better solution.
Training attempt 2
Step 4.2 Creating the network
In the second attempt we will create a new neural network, called 'NewNetwork2', and the number of hidden neurons will now be 8.
The neural network graph looks like this:
Step 5.2 Training the network
For training the neural network we'll again use the 70% of the original data set prepared for training. We select the training set and click 'Train'. In the new window that opens, we set the learning parameters: the maximum error will be 0.01, the learning rate 0.6, and the momentum 0.9. Then we click 'Train'.
The graph went down after 36 iterations, and we can conclude that the network quickly learned the data set.
Step 6.2 Testing the neural network
After testing the network we can see that the total mean square error is 0.030780706500805676, less than before, and conclude that this network is a better solution than the previous one.
Like before, we will see how the network reacts to the 5 randomly chosen inputs.
Observation | Alcohol | Malic acid | Ash | Alcalinity of ash | Magnesium | Total phenols | Flavanoids | Nonflavanoid phenols | Proanthocyanins | Color intensity | Hue | OD280/OD315 | Proline | Desired output (type 1, 2, 3) | Real output (type 1, 2, 3)
1 | 0.5125 | 0.21 | 0.11 | 0.736842 | 0.586957 | 0.543333 | 0.536 | 0.47 | 0.48 | 0.215 | 0.565 | 0.733333 | 0.393723 | 1, 0, 0 | 0.784902, 0.113840, 0.025097
11 | 0.62 | 0.162 | 0.705 | 0.5 | 0.326087 | 0.566667 | 0.596 | 0.26 | 0.465 | 0.341667 | 0.52 | 0.823333 | 0.457917 | 1, 0, 0 | 0.889018, 0.110322, 0.009990
41 | 0.27 | 0.166 | 0.66 | 0.394737 | 0.119565 | 0.2 | 0.3 | 0.52 | 0.41 | 0.116667 | 0.54 | 0.423333 | 0.14408 | 0, 1, 0 | 0.043273, 0.994508, 0.057837
70 | 0.25 | 0.486 | 0.5 | 0.421053 | 0.184783 | 0.333333 | 0.328 | 0.37 | 0.4675 | 0.023333 | 0.465 | 0.683333 | 0.203994 | 0, 1, 0 | 0.001986, 0.996112, 0.017252
82 | 0.47 | 0.398 | 0.7 | 0.473684 | 0.369565 | 0.1 | 0.244 | 0.24 | 0.2075 | 0.366667 | 0.37 | 0.14 | 0.179743 | 0, 0, 1 | 0.002695, 0.006183, 0.995752
The network guessed correctly in all five instances. After this test, we can conclude that this solution does not need to be rejected.
Training attempt 3
Step 4.3 Creating the network
In the next attempt we will create a neural network, called 'NewNetwork4', and we will set the number of hidden neurons to 11.
The neural network graph looks like this:
Step 5.3 Training the network
For training the neural network we select the training set and click 'Train'. In the new window that opens, we set the learning parameters: the maximum error will be 0.01, the learning rate 0.2, and the momentum 0.7. Then we click 'Train'.
The graph went down after only 34 iterations, and we can conclude that the network learned the data set very quickly.
Step 6.3 Testing the neural network
The total mean square error is now 0.030780706500805676, and we can say that this network is not a better solution than the previous one.
Like before, we will see how the network reacts to the 5 randomly chosen inputs.
Observation | Alcohol | Malic acid | Ash | Alcalinity of ash | Magnesium | Total phenols | Flavanoids | Nonflavanoid phenols | Proanthocyanins | Color intensity | Hue | OD280/OD315 | Proline | Desired output (type 1, 2, 3) | Real output (type 1, 2, 3)
1 | 0.5125 | 0.21 | 0.11 | 0.736842 | 0.586957 | 0.543333 | 0.536 | 0.47 | 0.48 | 0.215 | 0.565 | 0.733333 | 0.393723 | 1, 0, 0 | 0.774011, 0.146359, 0.094840
11 | 0.62 | 0.162 | 0.705 | 0.5 | 0.326087 | 0.566667 | 0.596 | 0.26 | 0.465 | 0.341667 | 0.52 | 0.823333 | 0.457917 | 1, 0, 0 | 0.885433, 0.026284, 0.092553
41 | 0.27 | 0.166 | 0.66 | 0.394737 | 0.119565 | 0.2 | 0.3 | 0.52 | 0.41 | 0.116667 | 0.54 | 0.423333 | 0.14408 | 0, 1, 0 | 0.042780, 0.923561, 0.049359
70 | 0.25 | 0.486 | 0.5 | 0.421053 | 0.184783 | 0.333333 | 0.328 | 0.37 | 0.4675 | 0.023333 | 0.465 | 0.683333 | 0.203994 | 0, 1, 0 | 0.007108, 0.987970, 0.011262
82 | 0.47 | 0.398 | 0.7 | 0.473684 | 0.369565 | 0.1 | 0.244 | 0.24 | 0.2075 | 0.366667 | 0.37 | 0.14 | 0.179743 | 0, 0, 1 | 0.002658, 0.003481, 0.997891
Again, the network guessed all five instances, and the solution is acceptable.
Training attempt 4
Step 5.4 Training the network
We will train the previously created neural network, called 'NewNetwork4', again, but we will change the maximum error, learning rate, and momentum to 0.01, 0.6, and 0.9. Then we click 'Train' and wait to see what happens.
We can see in the picture that training was successful and that the network learned the example after 70 iterations.
Step 6.4 Testing the network
The total mean square error after testing with the 'NewTest1' training set is 0.021718430058843522. It is also a small and acceptable error.
We can see that the same network with different parameters gives a better result.
Training attempt 5
Step 5.5 Training the network
We will now try to re-train the same network, only decreasing the learning rate to 0.3. The momentum remains the same, 0.9, as in the previous session.
We can see a big difference between the last two trainings. Now the network learns the example more quickly, after 40 iterations.
Step 6.5 Testing the network
The total mean square error is a little bigger than before, but still acceptable: 0.03183341152026723.
Training attempt 6
Step 4.6 Creating the network
Following these rules, we now decide on a neural network that contains 10 hidden neurons in one hidden layer. Again, we type in the standard numbers of inputs and outputs, check 'Use Bias Neurons', choose the sigmoid transfer function, and select 'Backpropagation with Momentum' as the learning rule.
Now, we will choose ten hidden neurons.
A graphical representation of the neural network looks like this:
Step 5.6 Training the network
Like the previous neural network, we will train this one with the training set we created before, using the entire sample. We select 'NewTest1', click 'Train', and a new window appears, asking us to fill in the parameters. We will set the maximum error to 0.01, the learning rate to 0.4, and the momentum to 0.7. After we click 'Train', the iteration process starts. We can see in the picture that the iteration process stops at 35 iterations.
Step 6.6 Testing the neural network
This time we didn't get a better result than before: the total mean square error is now 0.05860080947293679.
Even though the total mean square error isn't the best we have obtained, we'll test the network with the 5 random inputs we chose before.
Observation | Alcohol | Malic acid | Ash | Alcalinity of ash | Magnesium | Total phenols | Flavanoids | Nonflavanoid phenols | Proanthocyanins | Color intensity | Hue | OD280/OD315 | Proline | Desired output (type 1, 2, 3) | Real output (type 1, 2, 3)
1 | 0.5125 | 0.21 | 0.11 | 0.736842 | 0.586957 | 0.543333 | 0.536 | 0.47 | 0.48 | 0.215 | 0.565 | 0.733333 | 0.393723 | 1, 0, 0 | 0.186696, 0.701852, 0.126742
11 | 0.62 | 0.162 | 0.705 | 0.5 | 0.326087 | 0.566667 | 0.596 | 0.26 | 0.465 | 0.341667 | 0.52 | 0.823333 | 0.457917 | 1, 0, 0 | 0.554927, 0.511543, 0.004250
41 | 0.27 | 0.166 | 0.66 | 0.394737 | 0.119565 | 0.2 | 0.3 | 0.52 | 0.41 | 0.116667 | 0.54 | 0.423333 | 0.14408 | 0, 1, 0 | 0.001405, 0.956494, 0.004961
70 | 0.25 | 0.486 | 0.5 | 0.421053 | 0.184783 | 0.333333 | 0.328 | 0.37 | 0.4675 | 0.023333 | 0.465 | 0.683333 | 0.203994 | 0, 1, 0 | 0.010308, 0.976310, 0.019552
82 | 0.47 | 0.398 | 0.7 | 0.473684 | 0.369565 | 0.1 | 0.244 | 0.24 | 0.2075 | 0.366667 | 0.37 | 0.14 | 0.179743 | 0, 0, 1 | 0.007985, 0.002473, 0.999162
The network guessed correctly in four of the five instances. After this test, we can conclude that this solution does not need to be rejected.
Training attempt 7
Step 5.7 Training the network
In order to reduce the error we will re-train the same network, but with different training parameters. We will set the maximum error, learning rate, and momentum to 0.01, 0.2, and 0.7 to see what happens.
We can see in the picture that training was successful and that the network learned the example after 36 iterations.
Step 6.7 Testing the neural network
This time we got a better result than before: the total mean square error is now 0.02841794491082484.
Training attempt 8
Step 4.8 Creating the neural network
In this attempt we will try using 9 hidden neurons in one hidden layer, to see how the network reacts and to reduce the error as much as possible. First, we will create a new network called 'NewNetwork6'. When we click 'Finish', the neural network is created. If you want to see the neural network as a graph, just select 'Graph View'.
Step 5.8 Training the neural network
We will select the training set and, after clicking 'Train', set the learning parameters: the maximum error will be 0.02, the learning rate 0.2, and the momentum 0.7.
The graph dropped to its horizontal asymptote after only 33 iterations. This means that the neural network quickly learned the 70 percent of the data set used for training.
Step 6.8 Testing the neural network
With this network the total mean square error is 0.02676905451038841, and this is the best result we have obtained.
The final part of testing this network is testing it with several input values.
Observation | Alcohol | Malic acid | Ash | Alcalinity of ash | Magnesium | Total phenols | Flavanoids | Nonflavanoid phenols | Proanthocyanins | Color intensity | Hue | OD280/OD315 | Proline | Desired output (type 1, 2, 3) | Real output (type 1, 2, 3)
1 | 0.5125 | 0.21 | 0.11 | 0.736842 | 0.586957 | 0.543333 | 0.536 | 0.47 | 0.48 | 0.215 | 0.565 | 0.733333 | 0.393723 | 1, 0, 0 | 0.761839, 0.131143, 0.016788
11 | 0.62 | 0.162 | 0.705 | 0.5 | 0.326087 | 0.566667 | 0.596 | 0.26 | 0.465 | 0.341667 | 0.52 | 0.823333 | 0.457917 | 1, 0, 0 | 0.876078, 0.143904, 0.007838
41 | 0.27 | 0.166 | 0.66 | 0.394737 | 0.119565 | 0.2 | 0.3 | 0.52 | 0.41 | 0.116667 | 0.54 | 0.423333 | 0.14408 | 0, 1, 0 | 0.001084, 0.986615, 0.072881
70 | 0.25 | 0.486 | 0.5 | 0.421053 | 0.184783 | 0.333333 | 0.328 | 0.37 | 0.4675 | 0.023333 | 0.465 | 0.683333 | 0.203994 | 0, 1, 0 | 0.004876, 0.988745, 0.012079
82 | 0.47 | 0.398 | 0.7 | 0.473684 | 0.369565 | 0.1 | 0.244 | 0.24 | 0.2075 | 0.366667 | 0.37 | 0.14 | 0.179743 | 0, 0, 1 | 0.002464, 0.002975, 0.996511
We can conclude that the network guessed correctly in all five instances.
Conclusion
In conclusion, we show a table that summarizes all the training attempts, from which we can see which solutions to this problem are the best. We can see from the table that the number of hidden neurons is crucial to the effectiveness of a neural network. The experiment also showed that the success of a neural network is very sensitive to the parameters chosen in the training process: the learning rate must not be too high, and the maximum error must not be too low. Finally, the results have shown that the total mean square error alone does not directly reflect the success of a network.
Training attempt | Number of hidden neurons | Number of hidden layers | Training set | Testing set | Maximum error | Learning rate | Momentum | Total mean square error | Number of iterations | 5 random inputs test (correct guesses) | Network trained
1 | 1 | 1 | 70% | 30% | 0.01 | 0.2 | 0.7 | 0.1992 | 21 | 3/5 | yes
2 | 3 | 1 | 70% | 30% | 0.01 | 0.2 | 0.7 | 0.0423 | 45 | 3/5 | yes
3 | 8 | 1 | 70% | 30% | 0.01 | 0.6 | 0.9 | 0.0307 | 38 | 5/5 | yes
4 | 11 | 1 | 70% | 30% | 0.01 | 0.2 | 0.7 | 0.0491 | 34 | 5/5 | yes
5 | 11 | 1 | 70% | 30% | 0.01 | 0.6 | 0.9 | 0.0217 | 70 | / | no
6 | 11 | 1 | 70% | 30% | 0.01 | 0.3 | 0.9 | 0.0218 | 34 | / | no
7 | 10 | 1 | 70% | 30% | 0.02 | 0.4 | 0.7 | 0.0586 | 37 | 3/5 | yes
8 | 10 | 1 | 70% | 30% | 0.01 | 0.2 | 0.7 | 0.0284 | 36 | / | no
9 | 9 | 1 | 70% | 30% | 0.01 | 0.2 | 0.7 | 0.0267 | 33 | 5/5 | yes
8 | 9 | 1 | 80% | 20% | 0.01 | 0.2 | 0.7 | 0.0161 | 72 | / | yes
9 | 5 | 1 | 80% | 20% | 0.01 | 0.4 | 0.6 | 0.1930 | 14 | / | yes
10 | 5 | 1 | 80% | 20% | 0.01 | 0.2 | 0.05 | 0.1850 | 47 | / | no
11 | 9 | 1 | 90% | 10% | 0.01 | 0.4 | 0.6 | 0.0256 | 33 | / | yes
12 | 5 | 1 | 90% | 10% | 0.01 | 0.2 | 0.7 | 0.1834 | 12 | / | no
DOWNLOAD
See also:
Multi Layer Perceptron Tutorial